Add with_output version AppendAttention #3302
Conversation
Thanks for your contribution!
    self.causal,
    self.speculative_method is not None,
)[0]
if self.use_output:
Modifying only this file isn't enough; do a global search for every place that calls this append_attention.
OK.
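For context, a minimal sketch of the call-site pattern implied by the backend snippet above: when use_output is set, the caller pre-allocates fmha_out and hands it to the op instead of letting the op allocate it. This is illustrative only; append_attention_stub, the shapes, and the fmha_out keyword are assumptions, not FastDeploy's actual signature.

# Illustrative only: append_attention_stub stands in for the real custom op;
# names, shapes, and the fmha_out keyword are assumptions, not FastDeploy's API.
import paddle

def append_attention_stub(qkv, fmha_out=None):
    """Toy op: writes into a caller-provided buffer when one is given."""
    result = qkv * 2.0  # placeholder for the real attention computation
    if fmha_out is not None:
        paddle.assign(result, output=fmha_out)  # fill the pre-allocated buffer in place
        return (fmha_out,)
    return (result,)

use_output = True
qkv = paddle.randn([4, 16])

if use_output:
    # Pre-allocated output: its address stays fixed, which is what later
    # CUDA Graph capture relies on.
    fmha_out = paddle.empty([4, 16], dtype=qkv.dtype)
    out = append_attention_stub(qkv, fmha_out=fmha_out)[0]
else:
    out = append_attention_stub(qkv)[0]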
fastdeploy/model_executor/layers/attention/ops/append_attention.py (outdated, resolved)
Please also flesh out the PR description, explaining the background and goals of this change.
fastdeploy/model_executor/layers/attention/append_attn_backend.py (outdated, resolved)
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
…into append_attn_pr
Besides the two spots below that can be changed directly, the part where append_attention is declared as a custom operator also needs its Outputs updated. Since our append_attention no longer needs to output qkv_out, remove it:
PD_BUILD_STATIC_OP(append_attention)
    .Inputs({"qkv",
    ......
    .Outputs({"fmha_out", "qkv_out", "key_cache_out", "value_cache_out"})  // <--- this line
    .SetInplaceMap({{"key_cache", "key_cache_out"},

Change it to:

    .Outputs({"fmha_out", "key_cache_out", "value_cache_out"})
PS: When custom operator registration goes wrong, it always just throws this kind of exception; the main framework should add more context to these errors later.
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
custom_ops/gpu_ops/cpp_extensions.cc (outdated)
const paddle::Tensor &decoder_tile_ids_per_batch,
const paddle::Tensor &decoder_num_blocks,
const paddle::Tensor &set_max_lengths, const paddle::Tensor &max_len_kv,
paddle::Tensor &res,
Suggested change:
-    paddle::Tensor &res,
+    paddle::Tensor &fmha_out,
.Attrs({"compute_type: std::string", | ||
"cache_quant_type: std::string", | ||
"use_neox_rotary_style: bool", | ||
"rope_3d: bool", | ||
"max_input_length: int", | ||
"quant_max_bound: float", | ||
"quant_min_bound: float", | ||
"out_linear_in_scale: float", | ||
"encoder_block_shape_q: int", | ||
"decoder_block_shape_q: int", | ||
"max_partition_size: int", | ||
"encoder_max_partition_size: int", | ||
"speculate_max_draft_token_num: int", | ||
"causal: bool", | ||
"speculate_decoder: bool", | ||
"rms_norm_eps: float"}) |
Move rms_norm_eps earlier in the order here.
.Attrs({"compute_type: std::string", | |
"cache_quant_type: std::string", | |
"use_neox_rotary_style: bool", | |
"rope_3d: bool", | |
"max_input_length: int", | |
"quant_max_bound: float", | |
"quant_min_bound: float", | |
"out_linear_in_scale: float", | |
"encoder_block_shape_q: int", | |
"decoder_block_shape_q: int", | |
"max_partition_size: int", | |
"encoder_max_partition_size: int", | |
"speculate_max_draft_token_num: int", | |
"causal: bool", | |
"speculate_decoder: bool", | |
"rms_norm_eps: float"}) | |
.Attrs({"rms_norm_eps: float", | |
"compute_type: std::string", | |
"cache_quant_type: std::string", | |
"use_neox_rotary_style: bool", | |
"rope_3d: bool", | |
"max_input_length: int", | |
"quant_max_bound: float", | |
"quant_min_bound: float", | |
"out_linear_in_scale: float", | |
"encoder_block_shape_q: int", | |
"decoder_block_shape_q: int", | |
"max_partition_size: int", | |
"encoder_max_partition_size: int", | |
"speculate_max_draft_token_num: int", | |
"causal: bool", | |
"speculate_decoder: bool", | |
}) |
LGTM
Background: tensor address management during CUDA Graph capture.
Goal: move the attention module's output allocation up front, so tensor addresses are easier to handle during CUDA Graph capture.
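A minimal, self-contained sketch of that pattern, assuming Paddle's paddle.device.cuda.graphs.CUDAGraph utility and a GPU build; attention_like is a stand-in for append_attention, not FastDeploy code, and whether capture succeeds depends on the Paddle build and device. The key point is that the output buffer is allocated once, outside the op, so its address is identical at capture time and at every replay.

# Sketch only: attention_like stands in for append_attention; the CUDAGraph
# usage assumes a GPU-enabled Paddle build.
import paddle
from paddle.device.cuda.graphs import CUDAGraph

paddle.set_device("gpu")

q = paddle.randn([8, 128])
k = paddle.randn([128, 128])

# Output buffer allocated once, up front; its address must not change
# between capture and replay.
fmha_out = paddle.zeros([8, 128])

def attention_like(q, k, out):
    # Stand-in for the attention op: writes into the caller-provided buffer.
    paddle.assign(paddle.matmul(q, k), output=out)

# Warm-up run outside capture (lazy initializations, workspace allocations).
attention_like(q, k, fmha_out)

graph = CUDAGraph()
graph.capture_begin()
attention_like(q, k, fmha_out)  # the fixed fmha_out address is baked into the graph
graph.capture_end()

# For a new step, copy fresh data into the captured input buffers in place,
# then replay; results land in fmha_out at the same address every time.
paddle.assign(paddle.randn([8, 128]), output=q)
graph.replay()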